Biological Evolution in a Multidimensional Fitness Landscape 



(N 
O 

o 
d 



(N 



David B. Saakian^'^-^EI Zara Kirakosyan^, and Chin-Kun Hufl 

^Institute of Physics, Academia Sinica, Nankang, Taipei 11529, Taiwan 
^Yerevan Physics Institute, Alikhanian Brothers St. 2, Yerevan 375036, Armenia and 
"^National Center for Theoretical Sciences (North): Physics Division, National Taiwan University, Taipei 10617, Taiwan 

(Dated: December 7, 2012) 

We considered a multi-block molecular model of biological evolution, in which fitness is a function of the 
mean types of alleles located at different parts (blocks) of the genome. We formulated an infinite population 
model with selection and mutation, and calculated the mean fitness. For the case of recombination, we for- 
mulated a model with a multidimensional fitness landscape the dimension of the space is equal to the number 
of blocks) and derived a theorem about the dynamics of initially narrow distribution. We also considered the 
case of lethal mutations. We also formulated the finite population version of the model in the case of lethal 
mutations. Our models, derived for the virus evolution, are interesting also for the statistical mechanics and the 
Hamilton- Jacobi equation as well. 
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I. INTRODUCTION 



The investigation of biological evolution models llj-|5|] is 
one of the most fruitful applications of statistical mechanics 
or theoretical physics to biological problems ll6l [7ll- ll23ll . To 
solve the evolution models, one can apply the whole machin- 
ery of modern theoretical physics: spin-glass phy sics meth- 
ods ifTlll . quantum statistical mechanics iflafTsHlsll . quantum 
field the ory ifTsllTsll . Hamilton-Jacobi equation (optimal con- 
trol) lIMili- 

The genome, a collection of genes with different types, 
could be considered as a particular spin configuration of a 
statistical system, where the fitness (the rate to produce off- 
springs of the given genome) is equivalent to the Hamilto- 
nian of the spin system. In evolution theory, the notion of fit- 
ness is central in defining the general features of evolution or 
in modeling a concrete experiment. Fitness is a complicated 
function of gene content (types of genes) of the genome in se- 
quence space; this function is assumed to have a mean-field 
like behavior. Most of the investigations have been devoted 
to the symmetric fitness case, when there is a master (refer- 
ence) sequence, and fitness (energy) is a simple function of the 
Hamming distance from that sequence j^j- In ifisll . a general- 
ization of symmetric fitness landscape was considered, when 
there are some K reference sequences, and the fitness was a 
function of K Hamming distances from these reference se- 
quences. In I24h26i1 . there were suggested evolution models 
where the genome consisted of different blocks and the fit- 
ness is a function of the gene mean types at different blocks. 
In the current article, we follow the idea of 1240 . considering 
an infinitely long genome, a collection of a finite number of 
blocks, defining mean "magnetizations" at any such block and 
the fitness as a function of block magnetizations. We then use 
the Hamilton-Jacobi equation iTgf l to solve the equation. This 
new approach is technically easier than that used in ifisll . Thus 



in the present paper, we can calculate the mean fitness of a re- 
combination model in a multidimensional fitness landscape. 

Recombination is one of the key factors in evolution. The 
mathematical aspects of recombination were analyzed in iIztI - 
I29I1 . Recentl y, th ere was good progress in the statics of recom- 
bination l22ll23[l and there was some advance in the dynamics 
ifsoll . We will formulate the recombination model in a mul- 
tidimensional fitness landscape for many-loci haploid model 
with two alleles (type of gene) at any locus (position of a gene 
in the genome). 

The rest of the paper is organized as follows: In sec- 
tion II, we formulate and solve (calculate the mean fitness) 
the evolution model with selection and mutation in a multi- 
dimensional fitness landscape, including the case of lethal mu- 
tations 1 3 li 32] . We consider 2 block models for the lethal mu- 
tations and an asymmetric initial distribution. In section III, 
we formulate the recombination model in a multidimensional 
space. While we could not calculate the mean fitness, we de- 
rive a general result regarding the dynamics of population for 
the initial narrow distribution. In Sec. IV, we summarize our 
results and discuss problems for further research. 



II. THE MULTIDIMENSIONAL MODEL 
A. The Model 

We identify the alleles as spins and consider the genome 
as a collection of L spins taking the values ±1. In the peak 
configuration, all spins take value +\. Our model is a sim- 
ple generalization of the Crow-Kimura model The 
genome is a collection of H pieces (blocks), with the length 
Ln,l < n < H, such that X]^=i -^n = L. 

Any sequence is characterized by ^i, . . . , Lh, the number 
of "-" (negative) spins in the blocks. We introduce the "mag- 
netization" m„, defined as 
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at the n-th piece of genome for all of ?7 with 1 < n < H. 
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Our fitness r is a function of (li, . . . , Ih)- Thus, we define 
fii,...,iH = Lf{mi, . . . ,mfj). The discrete variables /„ are 
defined in the interval [0,i„], while the magnetizations to„ 
are becoming continuous at the limit N ^ (x and —1 < 
m„ < 1. We define the function /(mi, . . . , m//) as a fitness 
function. 

The description of the mutation process is the principal 
point in the definition of the model. In order to describe mu- 
tations we use the coefficients a;„±(Z„, L„): 



J-in J-ir. 



(2) 



where L„ denotes the length of the n-th piece, Xn+{ln, Ln) 
and x„_(Z„,L„) are the fractions of — and + spins in the 
n-th piece. 

If the initial distribution of the population is symmetric, i.e. 
all the sequences with the same /„ having the same probabil- 
ity, we describe the system through the p{li, . . . , Ih, t), the 
probability of all sequences having h, . . . ,Ih minus spins in 
corresponding blocks. Then we write the following system of 
equations: 

Ti = ['rh.....i„ - LR)p{li, ...,lH,t) 

at 

-p{li,...,lH,t) 

+ ^ LnXp{ln,L„)p{h, ■ ■ ■ ,ln - ■ ■ ■ ,lH,t), 



I3=±l. n 

0<i„<L„ 



p{li, . . ■ ,lH,t)ri^,...,iH, 



(3) 



The sum over n extends from 1 to iJ and R and is the mean 
fitness. We considered mutations independently at all pieces 
of the genome; thus, in the middle line at the right hand side 
of Eq. (O we changed an In at the position n, where n 
changes from 1 (first piece of genome) until H (the last piece 
of genome). 

The current configuration (/i, . . . , . . . , Ih) could be ob- 
tained from either {li, ...,/„ + !,..., Ih), reversing one of 
Z„ + 1 " — " spins, or from the (Zi, — 1, ... , Ih) config- 

uration reversing one of (L„ — Z„ + 1) " + " spins. There are 
(In — l) such possibilities for the first case, and (L„ — ln + 1) 
for the second case. Dividing by L, we derived the coefficients 
x^{ln + l,Ln) and x-^-{ln — l,iri) in Eq.©. For H = 1, 
Eq.® coincides with the Crow-Kimura model fil [l3l[l6ll . 

Let us consider the linear part of the latter equation, and 
write an equation for P{mi, . . . , niH, t) = p{li, . . . ,lH,t) 

dP{mi, . . .,mH,t) 



dt 

L{f{m,i, . 



mn) - ^)P{mi, . . .,mH,t) 
1 + /?m„ P - Is 



P=±l,l<n<H 



xP{mi 



— ,mH,t). 



(4) 



Following II33L 13411 . we define the mean fitness in the steady 
state of Eq.© as the largest eigenvalue of the quadratic matrix 
on the left hand side of Eq.©. 



Following 111911 . we assume an anzats: 

P{mi,...,mH,t) ^ exp[Lu{mi, . . . ,mH,t)]. (5) 

Then with 1/L accuracy we get the following HamUton- 
Jacobi equation (HJE): 

du(mi, . . . jITIh) tt/ 9u du . 

— = -H(mi, . . . , jtih; ^ — , . . . , ^ ), 



dt 



-H{mi, . . .,mH;Pi, ■ . ■,Ph) = f{rni, ■ . . jTOh) 



-1+ E ^( 



l<ri<ff 



L ' 2 



(6) 



where we missed 0{1/L) terms and introduced the momen- 

tums Pn = du/dnin- 

Consider the asymptotic solution: 

u{mi,...,mH,t) = Rt + un{mi,...,mH), (7) 

we get an equation 

R = /(toi, . . . ,toh) - 1 



E 

l<n<H 



L 2 



L„ 1 - m„ -2 ^"°'"'i' --"'"'-- 
L 2 ^ 



(8) 



On the other hand, we have a condition that at any point m, our 
R should be higher than the minimum of the right hand side, 
considered as a function of momentums '^"o(™i,-^:™n, 
We define 

U{mi, . . . , mn) = min[f{mi, . . . , m/f) — 1+ 



L„ 1 + m„ 2 ^""'"'i' 



^ ^ L 2 

l<n<H 



L„ 1 - m„ ^2 °""'"'!' 



■Iff) 



L 2 ' 

Examining the solution of the minimum problem and looking 
at different points m, we find: 

R > max[U{mi, . . ■ ,mn)]\rm,...,mH , 
U{mi,...,mH) = /(mi,...,m//) - 1 



l<n<H 



L 



(9) 



In Eq.(9) we take the maximum in the domain — 1 < m„ < 1. 
The function U{mi, ..niH) is the equivalent of the potential 
in classic mechanics. 

Following lfl9tl . we identify the mean fitness (the maximum 
eigenvalue of the matrix on the right hand side of Eq.©) with 
the lower bound of Eq.®, 



R = max[U{mi, . . . ,mH)]\mi, 



(10) 
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FIG. 1 : The comparison of analytical result (smooth line) with the 
numerics (dots) for the 3-d model with Li — L2 = L3 = 20. 
The whole genome mutation rate is 1. The first part of the genome 
has a fitness /i(mi) — kml/2. In the second part all the muta- 
tions are lethal. For the fitness contribution from this part, we have 
a zero for the sequences with zero mutations in this block and —00 
for the non-zero mutations. Part three is described by a single peak 
fitness landscape with fitness J = 3 for the peak subsequence and 
zero for other subsequences. Thus, the fitness function is defined as 
/(mi, 7712, ma) = fcmf/2 — [1 — S{m2 — 1)] * 00 + JS{m3 — 1), 
where the discrete 5{x) function is equal to 1 at zero and is equal to 
otherwise. The mean fitness is given as fc(l-l/(3fc))V2-f 3-2/3. 



One can calculate the mean fitness R by differentiating the 
function U (mi, . . . , m// ). 

Thus, we defined the mean fitness for the general multi- 
dimensional mean-field like fitness landscape for the evolu- 
tion model with selection and mutation. 

Figure 1 gives the comparison of our analytical result for 
Eq.(|9|l with numerics of the 3-dimensional model. 



B. The Multidimensional Model with Lethal Mutations 

Let us now consider a model where there exists some prob- 
abilities of lethal mutations: the Malthusian fitness r (after 
t period of time, the population without mutation grows e*"* 
times) while in the parallel (Crow-Kimura) model is becom- 
ing -00 m. 

At any piece of the genome, we consider the master sub- 
sequence, having non-lethal L„(l — A) neighbors with single 
mutations, where < A < 1 is a parameter describing the 
fraction of lethal mutations. When the fitness is a function of 
the Hamming distance from the reference sequence, we sim- 
plify the evolution equations using this symmetry. We define 
some mutations from the reference sequence as lethal muta- 
tions and assume that any sequence having at least one lethal 
mutation (plus some non lethal mutations) has a -cxo fitness. 
Therefore at the l-th Hamming class we have 



Lnjl - A„)! 
(L„(l-A„)-0!^! 



viable I point mutants, and as a maximal I, we take -L„(l — 
A„). For a small / ^ L„, there is a dilution of the sequence 



space via a factor (1 — A„), while the total number of viable 
sequence is: 



L„(l-A„) 
1=0 

We define now the fitness function as 

riu-.-dH = Lf{mi, . . . ,mH), 
where instead of Eq.([T]i, we now define 



In — Ln (1 — A„). 



(11) 



(12) 



(13) 



Then the calculation is, identical to those in [3Ci, give 

R = maXrn[fimi,...,mH)-l 
L 



(14) 



C. The Model in Multipeak Fitness Landscape 

We formulated the model by Eq.(3) for a rather general 
case. The multi-peak model, considered in ifisll . could be de- 
rived as a particular case of our solution. 

Let us choose H — 2^^^ and consider K reference se- 
quences with our s" spins, 1 < i < L, 1 < n < H. At 
any position i along the genome, we are looking at the align- 
ment of spins in our K reference sequences. We have cho- 
sen the first configuration with all " + " spins and define the 
alignment of spin along the z-th reference sequence at the ?i-th 
piece of genome as Q!i_„ . We group together the configurations 



a, „ and s'' 



-ain, where ai 



±1 and these two 



cases have a joint probability Ln/ L. The magnetization of the 
i-th sequence Mi is defined through our m„ as: 



H 



Mi = E ^anim 



(15) 



We then take a fitness which is a function of our H reference 
sequences. Thus, we should find the maximum of 

Ln 



F(Mi,...,Mk)-1+ -f^ 



l<n<H 



H 



■ E hi [-Mi + E -^^amimir, 



(16) 



where we introduced the auxiliary variables hi. The maxi- 
mum condition gives 



dF{M^,...,MK) 



9M; 



hiani = 



(17) 
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The last system of equations coincides with the one derived in 
ifisll with the mapping: 



m„ = (18) 

1 + iT,i=l Oln,iHi) 

where Hi are the fields, conjugate to the AU in Eq.dToli of 
ifTsll . A single difference: in iHI] we defined Ln/L for 2^ 
situations (misprints in Eqs.(|9| and (|24] | of ifisll . in which 2^ 
should be replaced by 2^^^ ), instead of 2^~^ in the current 
article. 



D. The 2-dimensional case 

The definition of tlie model . Let us consider the 2 dimen- 
sional case. We have a system of equations: 

dp{li,l2,t) 



dt 



{ri^,i^ - L - LR)p{li,l2,t) 



+ LxXp{h,Li)p{h- I3,l2,t)^ 

/3=±1 

L2Xp{l2,L2)p{ll,l2 - 13, t) 

a<i„<L„ 
We have a HJE for this case: 

du(rni, 7712) 



(19) 



dt 



-1 



du du 
= -H[mi,m2] — , ^ — ), 

iJ(mi,m2;A, A) = f (mi, 1712) 
1 - rur. 



ELn / 1 ^" 2P, 



(20) 



l<n<2 

The mean fitness R is defined through the equations 

Lr. 



l<n<2 



^2 



f [{mi, 1112) 
/2(mi,m2) = 



Li 



nil 



L2 m2 



L ^\ - ml 



(21) 



Tlie two-block model with lethal mutations. In Fig. 2 we 
compare the analytical results with the numerics for the two- 
block model,where one part has the length (L — n) with a 
lethal mutation (all the spin configurations of the block be- 
sides the one have —00 fitness), and the other block has the 
length n and a fitness /(mi) = km\/2. We obtain the mean 
fitness of this model as 



2 k(n + m) n + m 



(22) 



The asymmetric original distribution. We consider the 
original distribution m(0) = 0.6 for the symmetric distribu- 
tion, only considering the 1-d (Crow-Kimura) model: 



fim) 



(23) 




100 



n 



FIG. 2: The mean fitness R versus the length of the first block in 
the 2-d model with a fitness /(mi) = mi for the first block with the 
length n and lethal mutations for the second block, with zero fitness 
for the peak configuration of the second block. The total length of 
the genome is 100. The analytical results are given by the smooth 
line. 
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FIG. 3: The dynamics of m, mi, m2 for the model by Eqs.(19),(24) 
with mi(0) = 1, m2(0) = 0.2, Li = L/2, L2 = L/2. The middle 
line corresponds to the m by Eq.(24) or the Crow-Kimura model by 
Eq.(23) with m(0) = 0.6. 



Later we take the simplest asymmetric distribution, where the 
part Li spins have li minus spins and original narrow distribu- 
tion with mi = 1 — 2I1/L1. Another part has I2 minus spins 
have original narrow distribution around m2 = 1 — 2I2/ L2- 
We consider the model by Eq.(19) with the fitness 



k 



/(mi,m2) = -m~, 

, Li L2. 
m = (mi— + m2 — ) 

1j hi 



(24) 



Fig. 3 gives the results of the dynamics for m, mi,m2- 

The population distribution for the 2-d case. Let us in- 
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TABLE I: Mean fitness for the 2-d model by Eq. (25). Li = L2 = 

L/2, fca = ki 



L 


100 


100 


100 


100 


100 


ki 


4 


8 


4 


4 


3 


Kz 


3 


2 


5 


6 


7 


Rtheor 


7.0312 


9.025 


8.0277 


9.025 


9.025 


Rnum 


7.0315 


9.0251 


8.0280 


9.025 


9.0251 


91+93 

kl +^3 th.e.nr 


1. 


1 


1 


1 


1 


91+93 
^1+^3 n.n.m. 


0.991 


0.984 


0.992 


0.993 


0.994 



tern of equations: 

dp{li, ...,lH,t) 
dt 



= {rh,...,iH - LR)p{li, ...Jh) 



-Lp{li,...,lH,t) 

- ^ LnXf^iL - l3,Ln)pih, ■ ■ ■ Jn - ■ ■ ■ ,lH,t) 



3=±1, n 



+C[( ^ LnXplln.Ln) - l)p(Zl ,...,///, 



2 



vestigate the population distribution. We consider a fitness 

f [111.1,1712) = ^'^Aijiriiirij, 

ij 

Aii = ki,A22 = k2,Ai2 = k3 (25) 
Assuming an ansatz 

^, , TrLJdet{G) , ^<a-|(7|a;>, 
i-^(a;) = cxp[— _L- 



I3=±l,n 

Xp{li,...,ln + l3,...jH,t)], 

where the sum over n extends from 1 to H, and 



1\...Ih 



(31) 



(32) 



u{x) 



2 2 

< x\G\x > 



,x = m — s (26) 



we obtain for the correlation: 

K,j = / dx 

J P{x) 



is the equivalent of surplus or "surface" magnetization. For 
the simple symmetric fitness landscape {K = 1) which has 
one surplus parameter, but now there are H parameters. 

The term —Lp{li...lK,t) describes the mutations of the 
whole genome with a rate 1 per allele; the following line de- 
scribes the mutation. Using a coefficient c, we define the di- 
agonal recombination terms: — c is the total rate of changing 



GuGnj < xiXn >= Gi/27) the given sequence, and L„)^-^^ describes the re- 



Differentiating the HJE Eq.(20) for the steady state by xi , X2, 
and putting pi = 0,P2 = 0, we obtain: 



Aij Sj Gij Sj 



(28) 



For the symmetric fitness case 



All = A22 = ki,Ai2 = fcs 

Gil = G22 ~ <?1, G12 — 33, 



(29) 



si — S2 and Eq.(28) gives: 

fci + fcs = .91 + 53 (30) 
We verified the validity of Eq.(30) by the numerics in Table 1. 



combination event when we replace a spin from our current 
sequence with the same kind of spin from the pool of spins 
at the same position in population. In the second term inside 
"[. . . ]", we define the recombination terms as the change in 
the current configuration: we replace a spin with an opposite 
spin from the spin pool. 

Let us derive the Hamilton-Jacobi equation. We used the 
same anzats, Eq. (|5]l, as before; the simple derivations give: 



— = H{mi, . . . ,mK;si, . . . ,SH]u^, ■ ■ ■ ,Uf^), 
= f{mi, . . .,niH) - f{si, . . . ,sh) - I - c 



In A+ rUn 



L ' 2 



1 — TO 



^ = ±l,l<n<_ff 

■r-^/„ (l + m„)(l + s„) (l-TO„)(l-s„) 

t( 1 + Z 



(1+TO„)(1-S„) 2„' I (l-TO„)(l+5„) _2«' 
r, ^ 4 ^ " 4 



III. RECOMBINATION IN A MULTI-DIMENSIONAL 
FITNESS LANDSCAPE 

A. The Model 

In order to describe the recombination (horizontal gene 
transfer), we follow ll22il23ll . We consider the following sys- 



(33) 



where we denote m„ = ^"^"g^;"'"'*^ ■ The func- 
tion u{mi, . . . ,mH ,t) has the maximum at the point 
(TOi,...,mH) = (si,...,sh). 

We don't see a simple way to calculate the asymptotic so- 
lution of the last equation. 



6 



B. An Approximate Solution of Recombination Dynamics 

Let us consider the dynamics of the initial normal distribu- 
tion. 



P{mi, . . . ,TO_f/,0) 

yir 

2 



exphL^ ^(to/ - s/(0))(m„ - s„(0))]. (34) 



In 

Equation (|34] | describes a narrow distribution around some 
Hamming classes. 

We assume that for some not too large periods of time, we 
have a similar solution, 

P(mi, . . .,mH,t) 

^ cxp[-L^ ^(m/ - si{t)){mn ~ Sn(i))], (35) 

In 

where yin describes the normal distribution. 

We get the following system of equations for dsn (t) / dt us- 
ing our Hamiltonian form Eq.(l33]l 



dsn _ dH{si, . . . , sg; si, . . . , sg; 0, . . . , 0) 

dt dmj 

n ' 

^ dH{si,. ..,sh;si,...,sh;0,...,0) 

Let us prove that the last two terms do not depend on c. From 
the first line we obtain: 

dH{si, . . . , s_f/; si, . . . , s_f/; 0, . . . , 0) 



dm I 

Of (mi, . . .,mH) 



dmi 



For the rest we derive: 



2^ ^s,y,„. 



(37) 



(38) 



Eventually, putting the results of Eqs.([37]i.(|38]l into Eq. 
we derive: 

X^y^""^ = f'nisi, ■■■,sh) -2'^j-Siyin. (39) 

n n 

Thus for the initially narrow distribution of population by 
Eq. ( [34] i and mean-field like fitness landscape, the recom- 
bination does not have any impact on the relaxation dynamics 
for some period of time T. If the number of mutations and 
recombination per genome per replication is in the order of 
1, then we have the following condition for this time period: 
1 «: T < i. 




0.2 



0.0^ 
0.0 



0.1 



0.2 



0.3 



0.4 







FIG. 4: The dynamics of m = 1 — 2n/L, where n is the mean 
number of mutations in the model by Eqs. (40) and (41) with 
/(m) = m^L = 1000. The upper corresponds to the model with- 
out recombination, the middle line to the model with symmetric re- 
combination with the rate c — 1 and the low hne to the asymmetric 
recombination with ci = L5, C2 — 0.5. The time scale is chosen as 
in Eq.(41). For the zero selection case at time period 1 almost all the 
alleles in the genome are mutated. 



the simple case of a one dimensional fitness landscape. 

^ = [in - LR)] Pi + {1 + l)Pi+i + l))Pi-i 

-LMi4)^p.+c4(i-i)P.] 

where ci , C2 describe the recombination rates to the up and 
down directions in Hamming classes and I ^^[Pil- 

Using an ansatz Pi = cxp[Lit(m, t)], we derive the follow- 
ing HJE 

du (l + m)(l-s) (l-m)(l + g) 

Tt ^ 1 ^2 ^ 



2u 



1 + m , 1 — s 



C2] -1 



-2u 



1 — m , 1 + s 



1 



-ci 



2 ' 2 

Now we take u{t) = —y{ra — s{t)Y /2 and get an equation 



(41) 



y J = /'(.) - 2ys{t) - ^^^^(1 - smy. (42) 

We see that the recombination immediately starts to change 
the distribution, see Fig.4 for the illustration. 



C. Asymmetric Recombination. 

The theorem from the previous section is not valid for the 
asymmetric recombination, since we have different recombi- 
nation rates for the allele changes to up and down. Consider 



D. The Recombination Model with Lethal Mutations 

In order to describe the lethal mutations, we consider the 
genome which consists of two parts with the length Li ~ XL 
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and L{1 — A). In the first piece, tliere is only one sequence 
with the fitness 0, and any mutation in this part gives a lethal 
sequence with the — oo fitness. 

We can investigate the situation using our model by 
Eq.(40). Previously we used the mutation rate 1. Now we 
introduce the mutation rate fiQ per nucleotide and c as a re- 
combination rate per nucleotide. 

We just write the equations forp(0, /) = pi, identifying also 
r(0,0 =n: 



dpi 
dt 



riPi - PiPoL 



-. ,1-1 L-l + 1 
+L[M^^Pi+i H J Pi-i) 

+ ^^^ — + — — '^P' 

+c(^ 7^ — Pi-1 + ? 7, — Pi+i). (43) 



where we denoted the length of the genome without lethal 
mutations as Z = L(l — A). While in the previous models 
we took /io = 1, now we write formulas for general fiQ. 
Let us define 

21 ~L 



L ' 
ri = f{'m)L 



(44) 



then we can use the results of ESfl to calculate the mean fit- 
ness. If we define the potential U (m, s): 



U{m, s) ^ f{m) + v/(l^r^2jc^ + 



then the mean fitness of the genome is defined as 



(45) 



max[L(U{m, s) — Lfi^)], 
LR = Lf{s)il-X). 



(46) 



E. The finite population version of tlie model with lethal 
mutations. 

In the case of HIV, there are highly varia ble p arts of the 
genome with about 100 nucleotides 13511 . In OSll the use of 
an evolution model with shorter effective genome length to 
describe the virus evolution in such a case has been suggested; 
later this idea was applied in lf36ll . We assume that the usage of 
an effective genome length is reasonable for the zero epistasis 
case, while in the case of lethal mutations as well, we can not 
use an ordinary model with the short genome length. 

Extending the ideas in [37], we suggest the following finite 
population versions of the model. The genome consists of two 
parts. The first part has a length LA where all the mutations 
are lethal, while the n mutations from the part with the length 
L(l — A) give a mutant with the fitness function r„. The popu- 
lation is described via L — fi viable sequences and the n lethal 



ones, and the total population size N is fixed. We describe the 
population via the number of viruses n; in the /-th Hamming 
class, Q <l < L and n. We have a conserved population size, 

n + Ya^o = ^■ 

During the time period 5t , there are ^i5t{\ — A) non-lethal 
mutations and fiStX lethal mutations. 

We consider the following steps during the evolution: 

a. A birth of 5n new lethal mutants which is a binomial 
random process with the probabihty parameter 5tX and {N — 
n) trials. 

b. A birth of 5ni new viruses in the 1-th class, which is 
a binomial random process with n; trials and a probability 
parameter ri6t. 

c. Forward non-lethal mutations which are described via 
binomial random process with a probability parameter 5t{\ — 

and ni trials. 

Backward non-lethal mutations hi, which described via the 
binomial random process with probability parameter 5t{l — 
A)-^^ and ni trials. 

Thus after these mutation processes, ni ni — fi — hi , 
ni+i = ni+i + ni-i = n/_i + hi. 

d. The dilution of the model, where we reduce the virus 
population via n + X]/^o numbers, uniformly distributed 
via L + 2 classes. 



IV. CONCLUSION 

We formulated and solved the evolution model on the mul- 
tidimensional fitness space, where we considered the genome 
as a collection of several pieces and the total fitness as the 
function of the allele type fractions of the pieces. Such a 
model is more general and more realistic than the multi-point 
fitness landscape, considered in [18]. The numerics confirmed 
our analytical results well. 

We calculated the mean fitness of this model, including the 
case of lethal mutations and found a simple way of deriving 
the results of the multi-peak fitness models. 

We formulated the recombination model in the multidimen- 
sional fitness space. While we could not calculate the mean 
fitness, we derived the Hamiltonian-Jacobi equation for the 
dynamics of the population and deduced an important theo- 
rem about the dynamics. For the initially narrow initial dis- 
tribution and mean-field fitness landscape, the recombination 
does not affect the dynamics of the population for a rather 
long period of times ( see Fig. 2). This theorem is not valid in 
the case of asymmetric recombination. 

We formulated the finite population version of the model 
with lethal mutations. Our results could be applied to model 
virus experiments, prescribing to different parts of the genome 
either lethal mutations or negative or positive selection. For 
example, we can apply our model in the case of the Dengue 
virus, where 95% of the genome is epistasis free while there 
are strong correlations between the gene contributions of the 
reminder 5 % Issll . 

The main open mathematical problem in the investiga- 
tion of multi-dimensional evolution is the calculation of the 
surplus and the distribution around the peak of distribution. 
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While we calculated the mean fitness, we failed to calculate 
the surplus. In classical mechanics, one can easily define the 
ground state energy and the position of the interacting par- 
ticles, looking for the minimum of potential energy. Now, 
for our Hamiltonian by Eqs.(20), the situation is highly non- 
trivial. One should consider the asymptotic solution for the 
characteristics (the solutions of Hamilton equation), looking 
for the steady states. Another problem, which is important for 
applications,is to define the quadratic expansion of the solu- 
tion u(mi , 7712) near the maximum of distribution. Again, the 
situation is highly non-trivial, and different statistical physics 
phases are possible like the phases in |3^1 . While we found 
some relations, Eqs.(28),(30), we failed to find the complete 
solution of distribution. We hope that it is possible to suc- 
ceed using the advanced methods of HJE, to address this open 



problem. 
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